Introduction

Randomness is the mainstay of modern cryptography. Designing ciphers is no trifling task, and how a cipher's security is achieved matters just as much. Essentially, an encryption scheme consists of three things - an encryption function, a decryption function and a key. One might think that a good way to ensure the cipher cannot be broken is to simply conceal the encryption and decryption process - after all, if the adversary does not know what they are breaking, how can they break it?

Unfortunately, the cipher's inner workings will eventually be uncovered by dint of reverse engineering, and once they are, an entirely different cipher needs to be conceived, because the previous one relied on security by obscurity. Quite the predicament, isn't it?

Kerckhoffs' Principle

A cipher needs to be secure even if everything about it except the key is known.

The reason why the key should be the only unknown variable is that keys are just strings of bits and are thus relatively easy to change in comparison to the other components of a cipher. But in order to be sure that the cipher is as secure as possible, the key must be completely random - no single key should be more likely to be used than any other.
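To make this concrete, here is a minimal sketch in Python of the simplest scheme of this kind, the one-time pad, with a key generated uniformly at random. The one-time pad and the function names are purely illustrative assumptions of this sketch, not something prescribed by the text above.

```python
import secrets

def keygen(n: int) -> bytes:
    # Draw n bytes uniformly at random from the operating system's
    # cryptographically secure source - every key is equally likely.
    return secrets.token_bytes(n)

def encrypt(key: bytes, message: bytes) -> bytes:
    # One-time-pad encryption: XOR each message byte with a key byte.
    assert len(key) == len(message)
    return bytes(k ^ m for k, m in zip(key, message))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR is its own inverse, so decryption is the same operation.
    return encrypt(key, ciphertext)

message = b"attack at dawn"
key = keygen(len(message))
assert decrypt(key, encrypt(key, message)) == message
```

Every key returned by keygen is equally likely, which is exactly the uniformity requirement described above.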

Statistical Tests

And so here comes the question - what is random?

Definition: Randomness

A binary string is random if it was produced by a uniform distribution.

Definition Breakdown

A binary string is random if it was the outcome of a process where all possible outcomes had equal probability of happening.
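Concretely, for binary strings of length $n$ a uniform distribution assigns the same probability to every one of the $2^n$ possible strings:

$$\Pr[X = x] = \frac{1}{2^n} \quad \text{for every } x \in \{0,1\}^n.$$

For example, for $n = 3$ each of the eight strings $000, 001, \ldots, 111$ is chosen with probability $\frac{1}{8}$.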

Okay, but how do we determine that a binary string came from a uniform distribution if we are just given the string and know nothing else about it, i.e. no one has told us it was obtained from a uniform distribution? This is where statistical tests come in.

Definition: Statistical Test

A statistical test is an algorithm defined as

$$A: \{0,1\}^n \to \{0,1\},$$

which takes a binary string as input and outputs 1 if it deems the string random and 0 otherwise.

Definition Breakdown

A statistical test is an attempt to determine if a given binary string was obtained from a uniform distribution.

It is important to notice that since we lack any information other than the binary string itself, we can only make certain assumptions about what a uniformly chosen string would look like and check whether the given string fits those assumptions. Each statistical test encodes one such assumption and uses it to try to check whether a string was chosen uniformly at random. Since there is no other information, there is no "best" way or "best" statistical test.

Example: Statistical Tests

In a uniformly chosen string one would expect that the number of 0s and the number of 1s are approximately equal, so one possible statistical test is

$$A_1(x) = \begin{cases} 1, & \text{if } \left|\#_0(x) - \#_1(x)\right| \le c\sqrt{n}, \\ 0, & \text{otherwise,} \end{cases}$$

where $n$ is the length of the binary string $x$, $\#_0(x)$ and $\#_1(x)$ denote the number of 0s and 1s in $x$, and $c$ is some small constant (in a uniformly chosen string the difference between the two counts is on the order of $\sqrt{n}$).

Similarly, one would expect the longest sequence of 1s in a uniformly chosen string to be around $\log_2 n$, and so another possible statistical test would be

$$A_2(x) = \begin{cases} 1, & \text{if the longest run of 1s in } x \text{ has length at most } c\log_2 n, \\ 0, & \text{otherwise,} \end{cases}$$

again for some small constant $c$.

These examples illustrate that statistical tests can be pretty much anything, and that if we are given nothing but the string itself, we cannot determine with certainty whether it came from a uniform distribution. We can only test the string for properties that we would expect from a uniformly chosen string.
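As a rough illustration, the two tests above, corresponding to $A_1$ and $A_2$, could be implemented as follows. This is only a sketch - the concrete thresholds $10\sqrt{n}$ and $2\log_2 n$ are arbitrary choices for the constants, not values mandated by the definitions.

```python
import math

def balance_test(x: str) -> int:
    # A_1: accept if the counts of 0s and 1s differ by at most ~10*sqrt(n),
    # which is what one expects from a uniformly chosen string.
    n = len(x)
    return 1 if abs(x.count("0") - x.count("1")) <= 10 * math.sqrt(n) else 0

def longest_run_test(x: str) -> int:
    # A_2: accept if the longest run of 1s is at most ~2*log2(n);
    # in a uniform string the longest run is around log2(n).
    n = len(x)
    longest, current = 0, 0
    for bit in x:
        current = current + 1 if bit == "1" else 0
        longest = max(longest, current)
    return 1 if longest <= 2 * math.log2(n) else 0

print(balance_test("0110100101011001"), longest_run_test("0110100101011001"))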

Distinguishers

Statistical tests are often called distinguishers since they attempt to distinguish whether their input came from one distribution or another.

Obtaining Randomness

Cryptography requires randomness, and it requires a lot of it, too. However, computers (at least classical ones) are entirely deterministic, so randomness is actually quite difficult to come by. A computer might, for example, gather information from its temperature sensors or from surrounding electromagnetic noise. Nevertheless, such sources can only provide so many random bits and rarely satisfy the demand for randomness at any given time.

So, it would be useful to be able to use these random bits to obtain more random bits, wouldn't it?

Pseudorandomness

There is a caveat to the process of obtaining more randomness via a computer, however. Since classical computers are deterministic, it is not really possible for them to obtain truly random bits - a classical computer cannot really "choose a string from a uniform distribution". Besides, producing longer strings from shorter ones requires generating information - it is like filling in the gaps of a puzzle with missing pieces. Classical computers have no way of generating information at random - they can only obtain it from their surroundings, as mentioned previously, and those surroundings can only provide so much randomness. The rest requires an algorithm, and an algorithm means a pattern. Therefore, we will have to settle for something that is close enough to random - i.e. something whose pattern is extremely difficult to detect.
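To get a feel for what such a deterministic "stretching" procedure might look like, here is a toy sketch that repeatedly hashes a short random seed together with a counter. SHA-256 and the function name are used purely for illustration; this is an assumption of the sketch, not a construction endorsed by the text.

```python
import hashlib
import os

def expand(seed: bytes, num_bytes: int) -> bytes:
    # Deterministically stretch a short seed into a longer output by
    # hashing the seed together with an increasing counter. The same
    # seed always yields the same output - the "randomness" comes
    # entirely from the seed itself.
    out = b""
    counter = 0
    while len(out) < num_bytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:num_bytes]

seed = os.urandom(16)          # 16 truly random bytes from the OS
stream = expand(seed, 1024)    # 1024 pseudorandom bytes derived from them
```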

Definition: Pseudorandomness

A string of bits $x \in \{0,1\}^n$ is pseudorandom if for every efficient statistical test $A$ running in time $p(n)$, where $p$ is some polynomial, it holds true that

$$\left|\Pr[A(x) = 1] - \Pr_{y \leftarrow \{0,1\}^n}[A(y) = 1]\right| \le \epsilon(n)$$

for some negligible function $\epsilon$.

Definition Breakdown

Essentially, a string of bits with length $n$ is pseudorandom if there is no efficient statistical test which can distinguish with non-negligible probability between it and a string chosen uniformly from all strings of length $n$. In other words, the difference between the probability that any statistical test classifies the given string as random and the probability that it classifies a uniformly chosen string as random should be very, very small, i.e. negligible.
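To tie this back to statistical tests, one can empirically estimate the difference from the definition by running a single test on many outputs of a candidate generator and on many uniformly chosen strings, then comparing the acceptance rates. The sketch below uses a deliberately biased toy generator and made-up sample sizes purely for illustration.

```python
import random

def balance_test(x: str) -> int:
    # Accept the string if its counts of 0s and 1s are roughly balanced,
    # as one would expect from a uniformly chosen string.
    n = len(x)
    return 1 if abs(x.count("0") - x.count("1")) <= 10 * n ** 0.5 else 0

def uniform_string(n: int) -> str:
    # Stand-in for sampling from the uniform distribution over {0,1}^n.
    return "".join(random.choice("01") for _ in range(n))

def biased_string(n: int) -> str:
    # A deliberately bad "generator": each bit is 1 with probability 0.9.
    return "".join("1" if random.random() < 0.9 else "0" for _ in range(n))

n, samples = 256, 2000
accept_biased = sum(balance_test(biased_string(n)) for _ in range(samples)) / samples
accept_uniform = sum(balance_test(uniform_string(n)) for _ in range(samples)) / samples

# A large gap between the two acceptance rates means the biased generator's
# output is easily told apart from uniform, i.e. it is not pseudorandom
# with respect to this test.
print(abs(accept_uniform - accept_biased))
```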

Comparing Distributions

Statistical tests provide a way to determine if a string is likely to have been obtained from a uniform distribution. In a sense, they compare a given string with a string from a uniform distribution. This raises the question: can statistical tests be used to compare two distributions? Indeed, they can!

Definition: Computational Indistinguishability

Two distributions $D_1$ and $D_2$ over $\{0,1\}^n$ are $(T, \epsilon)$-computationally indistinguishable, denoted by $D_1 \approx_{T,\epsilon} D_2$, if for every algorithm $A$ computable in at most $T$ operations, it holds that

$$\left|\Pr_{x \leftarrow D_1}[A(x) = 1] - \Pr_{x \leftarrow D_2}[A(x) = 1]\right| \le \epsilon.$$

Definition Breakdown

One can think of the algorithm $A$ as an algorithm which tries to determine if its input was obtained from the distribution $D_1$ or from the distribution $D_2$, i.e.

$$A(x) = \begin{cases} 1, & \text{if } A \text{ thinks that } x \text{ was drawn from } D_1, \\ 0, & \text{if } A \text{ thinks that } x \text{ was drawn from } D_2. \end{cases}$$

Essentially, the definition says that if $D_1$ and $D_2$ are $(T, \epsilon)$-computationally indistinguishable, then there is no algorithm taking at most $T$ steps to run that can differentiate whether its input came from $D_1$ or $D_2$ with non-negligible probability. In other words, any such algorithm is approximately equally likely to think that a given input came from $D_1$ as it is to believe that it came from $D_2$, i.e.

$$\Pr_{x \leftarrow D_1}[A(x) = 1] \approx \Pr_{x \leftarrow D_2}[A(x) = 1].$$

The numbers $T$ and $\epsilon$ are parameters. If an algorithm had more time to run, i.e. $T$ was a big number, then it could perform more computations and so it is reasonable to expect that it could better distinguish between the two distributions. Just how much better is quantified by the number $\epsilon$, which bounds the difference between the probability that the algorithm thinks an input came from the distribution $D_1$ and the probability that it thinks the input came from the distribution $D_2$.

Example

Consider two distributions $D_1$ and $D_2$ over $\{0,1\}^n$ which are $(T, \epsilon)$-computationally indistinguishable. This means that for any algorithm $A$ which takes at most $T$ steps to complete on an input of length $n$, it is true that the difference between the probability that $A$ thinks its input came from $D_1$ and the probability that $A$ thinks it came from $D_2$ is at most $\epsilon$.

Computational indistinguishability is a way to measure how "close" or "similar" two distributions are, i.e. how different the probabilities they assign to the same strings are. It is reasonable to expect that if the distribution $D_1$ is computationally indistinguishable from the distribution $D_2$ and $D_2$ is computationally indistinguishable from the distribution $D_3$, then $D_1$ is also computationally indistinguishable from $D_3$. After all, if one thing is close to a second thing which is close to a third thing, then the first thing is also close to the third. And indeed, this turns out to be true for computationally indistinguishable distributions, albeit with a small loss in the parameters!

Theorem: Triangle Inequality for Computational Indistinguishability

If $D_1 \approx_{T,\epsilon} D_2 \approx_{T,\epsilon} \cdots \approx_{T,\epsilon} D_m$, then $D_1 \approx_{T,\,(m-1)\epsilon} D_m$.

Theorem Breakdown

If you have a sequence of distributions $D_1, D_2, \ldots, D_m$, where adjacent distributions are close to one another, then it makes sense that the first and the last distribution are also close to one another. However, it is still the case that $D_1$ is closer to $D_2$ than it is to $D_m$, which is why $D_1$ and $D_m$ are only $(T, (m-1)\epsilon)$-indistinguishable and not $(T, \epsilon)$-indistinguishable. The "distance" between $D_1$ and $D_m$ is greater than the distance between any two adjacent distributions $D_i$ and $D_{i+1}$, which is why an algorithm running in time $T$ would be a bit better at distinguishing $D_1$ from $D_m$ than at distinguishing $D_i$ from $D_{i+1}$, hence the weaker bound $(m-1)\epsilon$.
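For instance, in the smallest non-trivial case $m = 3$ the theorem says that

$$D_1 \approx_{T,\epsilon} D_2 \quad \text{and} \quad D_2 \approx_{T,\epsilon} D_3 \quad \Longrightarrow \quad D_1 \approx_{T,\,2\epsilon} D_3,$$

i.e. the distinguishing advantage against $D_1$ versus $D_3$ may be up to twice as large as the advantage against any adjacent pair.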

Proof: Triangle Inequality for Computational Indistinguishability

Suppose, towards a contradiction, that there is an algorithm $A$ running in time $T$ such that

$$\left|\Pr_{x \leftarrow D_1}[A(x) = 1] - \Pr_{x \leftarrow D_m}[A(x) = 1]\right| > (m-1)\epsilon.$$

The left-hand side can be rewritten as a telescoping sum,

$$\left|\Pr_{x \leftarrow D_1}[A(x) = 1] - \Pr_{x \leftarrow D_m}[A(x) = 1]\right| = \left|\sum_{i=1}^{m-1}\left(\Pr_{x \leftarrow D_i}[A(x) = 1] - \Pr_{x \leftarrow D_{i+1}}[A(x) = 1]\right)\right|,$$

since all the intermediate terms cancel out.

Therefore, by the triangle inequality,

$$\sum_{i=1}^{m-1}\left|\Pr_{x \leftarrow D_i}[A(x) = 1] - \Pr_{x \leftarrow D_{i+1}}[A(x) = 1]\right| \ge \left|\sum_{i=1}^{m-1}\left(\Pr_{x \leftarrow D_i}[A(x) = 1] - \Pr_{x \leftarrow D_{i+1}}[A(x) = 1]\right)\right| > (m-1)\epsilon,$$

and hence there must be at least one pair of adjacent distributions $D_i$ and $D_{i+1}$ for which

$$\left|\Pr_{x \leftarrow D_i}[A(x) = 1] - \Pr_{x \leftarrow D_{i+1}}[A(x) = 1]\right| > \epsilon,$$

because if every one of the $m-1$ terms were at most $\epsilon$, their sum could not exceed $(m-1)\epsilon$.

This contradicts the assumption that $D_i \approx_{T,\epsilon} D_{i+1}$ for all $i$.